Multimodal image-text models have shown remarkable performance in the past few years. However, evaluating their robustness against distribution shifts is crucial before adopting them in real-world applications. In this paper, we investigate the robustness of 9 popular open-sourced image-text models under common perturbations on five tasks (image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation). In particular, we propose several new multimodal robustness benchmarks by applying 17 image perturbation and 16 text perturbation techniques on top of existing datasets. We observe that multimodal models are not robust to image and text perturbations, especially to image perturbations. Among the tested perturbation methods, character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data. We also introduce two new robustness metrics (MMI and MOR) for proper evaluations of multimodal models. We hope our extensive study sheds light on new directions for the development of robust multimodal models.
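Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) have been studied in different research programs, resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and provide recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune 31k networks across nine different architectures. Our findings confirm that in-distribution and out-of-distribution accuracies tend to increase jointly, but show that their relationship is largely dataset-dependent and generally more nuanced and more complex than suggested by previous, smaller-scale studies.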
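Uncertainty estimation in deep learning has recently emerged as a key area for improving reliability and robustness in safety-critical applications. Many proposed methods focus either on distance-aware model uncertainty for out-of-distribution detection or on input-dependent label uncertainty for in-distribution calibration, yet both types of uncertainty are often necessary. In this work, we propose the HetSNGP method for jointly modeling model and data uncertainty. We show that our proposed model affords a favorable combination of these two types of uncertainty and thus outperforms baseline methods on some challenging out-of-distribution datasets, including CIFAR-100C, ImageNet-C, and ImageNet-A. Moreover, we propose HetSNGP Ensemble, an ensembled version of our method that additionally models uncertainty over the network parameters and outperforms other ensemble baselines.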
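High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning, which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet competitive comparisons of methods are often lacking for a range of reasons, including the availability of compute for extensive tuning, the incorporation of sufficiently many baselines, and concrete documentation for reproducibility. In this paper, we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 9 methods, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally, we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code is available at https://github.com/google/uncertainty-baselines.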
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with them. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
In this paper, we present a novel visual SLAM and long-term localization benchmark for autonomous driving in challenging conditions based on the large-scale 4Seasons dataset. The proposed benchmark provides drastic appearance variations caused by seasonal changes and diverse weather and illumination conditions. While significant progress has been made in advancing visual SLAM on small-scale datasets with similar conditions, there is still a lack of unified benchmarks representative of real-world scenarios for autonomous driving. We introduce a new unified benchmark for jointly evaluating visual odometry, global place recognition, and map-based visual localization performance which is crucial to successfully enable autonomous driving in any condition. The data has been collected for more than one year, resulting in more than 300 km of recordings in nine different environments ranging from a multi-level parking garage to urban (including tunnels) to countryside and highway. We provide globally consistent reference poses with up to centimeter-level accuracy obtained from the fusion of direct stereo-inertial odometry with RTK GNSS. We evaluate the performance of several state-of-the-art visual odometry and visual localization baseline approaches on the benchmark and analyze their properties. The experimental results provide new insights into current approaches and show promising potential for future research. Our benchmark and evaluation protocols will be available at https://www.4seasons-dataset.com/.
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly well suited to high-dimensional directional data such as texts. In this article, we propose to estimate a von Mises mixture using an l1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real data benchmarks. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis.
Passive monitoring of acoustic or radio sources has important applications in modern convenience, public safety, and surveillance. A key task in passive monitoring is multiobject tracking (MOT). This paper presents a Bayesian method for multisensor MOT for challenging tracking problems where the object states are high-dimensional and the measurements follow a nonlinear model. Our method is developed in the framework of factor graphs and the sum-product algorithm (SPA). The multimodal probability density functions (pdfs) provided by the SPA are effectively represented by a Gaussian mixture model (GMM). To perform the operations of the SPA in high-dimensional spaces, we make use of particle flow (PFL). Here, particles are migrated towards regions of high likelihood based on the solution of a partial differential equation. This makes it possible to obtain good object detection and tracking performance even in challenging multisensor MOT scenarios with single-sensor measurements that have a lower dimension than the object positions. We perform a numerical evaluation in a passive acoustic monitoring scenario where multiple sources are tracked in 3-D from 1-D time-difference-of-arrival (TDOA) measurements provided by pairs of hydrophones. Our numerical results demonstrate favorable detection and estimation accuracy compared to state-of-the-art reference techniques.
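To make the measurement geometry in the abstract above concrete, the following minimal sketch computes a noise-free 1-D TDOA for a 3-D source and one pair of hydrophones. It illustrates only the standard TDOA definition (range difference divided by propagation speed), not the paper's tracking method. The function name `tdoa`, the example coordinates, and the nominal sound speed of 1500 m/s are assumptions made for illustration.

```python
import math

# Nominal speed of sound in water (m/s); an assumed value for illustration.
SPEED_OF_SOUND_WATER = 1500.0


def tdoa(source, hydrophone_a, hydrophone_b, c=SPEED_OF_SOUND_WATER):
    """Noise-free TDOA in seconds: arrival time at A minus arrival time at B.

    All positions are 3-D coordinates in metres. A single TDOA is a scalar,
    so it cannot by itself determine the 3-D source position.
    """
    range_a = math.dist(source, hydrophone_a)
    range_b = math.dist(source, hydrophone_b)
    return (range_a - range_b) / c


# Hypothetical geometry: the source is 50 m from A and 40 m from B.
source = (30.0, 40.0, 0.0)
pair = ((0.0, 0.0, 0.0), (30.0, 0.0, 0.0))
delay = tdoa(source, *pair)
print(f"TDOA: {delay * 1e3:.3f} ms")
```

Each such measurement only constrains the source to a hyperboloid with the two hydrophones as foci, which is why the measurement has a lower dimension than the object position and several sensor pairs are needed.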
Location-aware networks will introduce new services and applications for modern convenience, surveillance, and public safety. In this paper, we consider the problem of cooperative localization in a wireless network where the position of certain anchor nodes can be controlled. We introduce an active planning method that aims at moving the anchors such that the information gain of future measurements is maximized. In the control layer of the proposed method, control inputs are calculated by minimizing the traces of approximate inverse Bayesian Fisher information matrices (FIMs). The estimation layer computes estimates of the agent states and provides Gaussian representations of marginal posteriors of agent positions to the control layer for approximate Bayesian FIM computations. Based on a cost function that accumulates Bayesian FIM contributions over a sliding window of discrete future timesteps, a receding horizon (RH) control is performed. Approximations that make it possible to solve the resulting tree-search problem efficiently are also discussed. A numerical case study demonstrates the intelligent behavior of a single controlled anchor in a 3-D scenario and the resulting significantly improved localization accuracy.
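The idea of choosing an anchor position by minimizing the trace of an inverse FIM can be illustrated with a deliberately simplified sketch. It is not the paper's method: it assumes 2-D positions, range measurements with Gaussian noise, a FIM evaluated at a point estimate of the agent position, and a single greedy step instead of a receding horizon over a sliding window. All function names and numbers are hypothetical.

```python
import math


def range_fim(agent, anchor, sigma):
    """FIM contribution of one range measurement with noise std sigma.

    For Gaussian range noise this is (1 / sigma^2) * u u^T, where u is the
    unit vector pointing from the anchor to the agent.
    """
    dx, dy = agent[0] - anchor[0], agent[1] - anchor[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist
    w = 1.0 / sigma**2
    return [[w * ux * ux, w * ux * uy], [w * ux * uy, w * uy * uy]]


def trace_of_inverse(m):
    """Trace of the inverse of a 2x2 matrix, via its determinant."""
    (a, b), (c, d) = m
    return (a + d) / (a * d - b * c)


def cost(agent, anchors, sigma, prior_info):
    """Trace of the inverse of (prior information + measurement information)."""
    total = [row[:] for row in prior_info]
    for anchor in anchors:
        contribution = range_fim(agent, anchor, sigma)
        for i in range(2):
            for j in range(2):
                total[i][j] += contribution[i][j]
    return trace_of_inverse(total)


def best_candidate(agent, fixed_anchors, candidates, sigma, prior_info):
    """Greedy one-step choice of the controlled anchor's next position."""
    return min(
        candidates,
        key=lambda c: cost(agent, fixed_anchors + [c], sigma, prior_info),
    )


# Hypothetical setup: one fixed anchor on the x-axis and a weak prior.
agent_estimate = (0.0, 0.0)
fixed = [(10.0, 0.0)]
candidates = [(-10.0, 0.0), (0.0, 10.0)]
prior = [[0.01, 0.0], [0.0, 0.01]]
print(best_candidate(agent_estimate, fixed, candidates, 1.0, prior))
```

In this setup the candidate on the y-axis is preferred, because a second anchor collinear with the first adds no information in the y direction and leaves the trace of the inverse FIM large.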
This paper presents an introduction to the state-of-the-art in anomaly and change-point detection. On the one hand, the main concepts needed to understand the vast scientific literature on those subjects are introduced. On the other, a selection of important surveys and books, as well as two selected active research topics in the field, are presented.